In this note, we revisit the algorithm of Har-Peled et al. [HRZ07] for computing a linear maximum
margin classifier. Our presentation is self-contained, and the algorithm is slightly simpler
than the original: it is a simple Perceptron-like iterative algorithm.
For more details and background, the reader is referred to the original paper.


1. Active learning, sparsity and large margin 

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. Every point has a label/color (say black or white), but we do not
know the labels. In particular, let $B$ and $W$ be the sets of black and white points in $P$, respectively. Furthermore, let
$\Delta = \mathrm{diam}(P)$, and assume that there exist two parallel hyperplanes $h, h'$ at distance $\gamma$ from each other,
such that the slab between $h$ and $h'$ does not contain any point of $P$, the points of $B$ are on one side
of this slab, and the points of $W$ are on the other side. The quantity $\gamma$ is the margin of $P$.

A somewhat more convenient way to handle such slabs is to consider two points $b$ and $w$ in $\mathbb{R}^d$.
Let $\mathrm{slab}(b, w)$ be the region of points in $\mathbb{R}^d$ whose projection onto the line spanned by $b$
and $w$ is contained in the open segment $bw$. We use $(1 - \varepsilon)\mathrm{slab}(b, w)$ to denote the slab formed from
$\mathrm{slab}(b, w)$ by shrinking it by a factor of $(1 - \varepsilon)$ around its middle hyperplane. Formally, it is defined as
$(1 - \varepsilon)\mathrm{slab}(b, w) = \mathrm{slab}(b', w')$, where $b' = (1 - \varepsilon/2)b + (\varepsilon/2)w$ and $w' = (\varepsilon/2)b + (1 - \varepsilon/2)w$.
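
As an illustration, the slab definitions above can be sketched in a few lines of Python. The helper names (slab_param, in_slab, shrink) are hypothetical and do not come from [HRZ07]; points are assumed to be plain lists of coordinates, and the sketch only restates the definitions.

```python
def sub(u, v):
    """Coordinate-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def slab_param(p, b, w):
    """Position of the projection of p on the line through b and w:
    0 at b, 1 at w."""
    d = sub(w, b)
    return dot(sub(p, b), d) / dot(d, d)

def in_slab(p, b, w):
    """True if p lies in slab(b, w), i.e. its projection falls in the
    open segment bw."""
    return 0.0 < slab_param(p, b, w) < 1.0

def shrink(b, w, eps):
    """Anchors (b', w') of the shrunken slab (1 - eps)slab(b, w)."""
    b2 = [(1 - eps / 2) * x + (eps / 2) * y for x, y in zip(b, w)]
    w2 = [(eps / 2) * x + (1 - eps / 2) * y for x, y in zip(b, w)]
    return b2, w2
```

Membership in $(1 - \varepsilon)\mathrm{slab}(b, w)$ is then in_slab(p, *shrink(b, w, eps)). Only the projection onto the line through $b$ and $w$ matters, so the test costs $O(d)$ time per point.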

In the following, we assume access to a labeling oracle that returns the label of a
specific query point. Similarly, we assume access to a counterexample oracle: given a slab
that does not contain any point of $P$ in its interior, and supposedly separates the points of $P$ into $B$
and $W$, it returns a point that is mislabeled by this classifier (i.e., slab), if such a point exists.

Conceptually, querying the oracles is quite expensive, and the algorithm tries to minimize
the number of such queries.

The algorithm. Assume there are two given points $b_1 \in B$ and $w_1 \in W$. For $i > 0$, in the $i$th iteration, the
algorithm considers the slab $S_i = (1 - \varepsilon)\mathrm{slab}(b_i, w_i)$. There are two possibilities:

(A) If the slab $S_i$ contains no points of $P$, then the algorithm uses the counterexample oracle to check
if it is done, that is, whether all the points are classified correctly. If so, the algorithm stops. Otherwise,
the oracle returns a misclassified point $p_i$.

* Department of Computer Science; University of Illinois; 201 N. Goodwin Avenue; Urbana, IL, 61801, USA; 
sariel@illinois.edu; http://sarielhp.org/. Work on this paper was partially supported by NSF AF awards
CCF-1421231 and CCF-1217462.



(B) Otherwise, $S_i$ contains some points of $P$. Let $p_i$ be the point of $P$ in $S_i$ that is closest to the middle
hyperplane of the slab. The algorithm uses the labeling oracle to get the label of $p_i$.

Assume that the label of $p_i$ is white. Then, the algorithm sets $w_{i+1}$ to be the projection of $b_i$ onto $w_i p_i$, and
$b_{i+1} = b_i$ (the case that $p_i$ is black is handled in a symmetric fashion).
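
The iteration above can be sketched as follows for the case where all labels are already known, so that both oracles are simulated by direct lookups. This is a minimal sketch under stated assumptions, not the implementation of [HRZ07]: the function names, the 'B'/'W' label encoding, the max_iter cap, and the choice to project onto the closed segment $w_i p_i$ (clamping at its endpoints) are all assumptions made here for illustration.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_on_segment(q, a, c):
    """Point of the segment a c that is closest to q."""
    d = sub(c, a)
    t = max(0.0, min(1.0, dot(sub(q, a), d) / dot(d, d)))
    return [x + t * y for x, y in zip(a, d)]

def separating_slab(points, labels, b, w, eps, max_iter=100000):
    """points: list of coordinate lists; labels: parallel list of 'B'/'W'.
    b, w: one black and one white input point (assumed given).
    Returns anchors (b, w) such that (1 - eps)slab(b, w) separates B from W."""
    lo, hi = eps / 2, 1 - eps / 2
    for _ in range(max_iter):
        d = sub(w, b)
        dd = dot(d, d)
        # Position of each projection on the line b w: 0 at b, 1 at w.
        ts = [dot(sub(p, b), d) / dd for p in points]
        inside = [i for i, t in enumerate(ts) if lo < t < hi]
        if inside:
            # Case (B): take the point closest to the middle hyperplane.
            i = min(inside, key=lambda k: abs(ts[k] - 0.5))
        else:
            # Case (A): simulated counterexample oracle. A point with
            # t <= lo is classified black, one with t >= hi white.
            bad = [k for k, t in enumerate(ts)
                   if (t <= lo) != (labels[k] == 'B')]
            if not bad:
                return b, w
            i = bad[0]
        if labels[i] == 'W':
            w = closest_on_segment(b, w, points[i])
        else:
            b = closest_on_segment(w, b, points[i])
    raise RuntimeError("iteration cap reached")
```

Each iteration scans all $n$ points and does $O(d)$ work per point, which matches the $O(dn)$ cost per iteration in the lemma below. In the active-learning setting, the label lookups would be replaced by calls to the two oracles.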

Lemma 1.1 ([HRZ07]). Let $P$ be a set of $n$ points in $\mathbb{R}^d$, with diameter $\Delta$. Assume there is an unknown
partition of $P$ into two point sets $B$ and $W$, of black and white points, respectively, and this
partition has margin $\gamma$. Furthermore, we are given access to labeling and counterexample oracles.
Finally, there are two given points $b_1 \in B$ and $w_1 \in W$.

Then, for any $\varepsilon > 0$, one can compute using an iterative algorithm, in $I = O\bigl((\Delta/\gamma)^2/\varepsilon^2\bigr)$ iterations
and in $O(Idn)$ time, a slab of width $\geq (1 - \varepsilon)\gamma$ that separates $B$ from $W$. This algorithm performs $I$
calls to the labeling/counterexample oracles.


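Proof: Our purpose is to analyze the number of iterations of this algorithm until it terminates. So, let
$\ell_i = \|b_i - w_i\|$. Clearly, $\Delta \geq \ell_1 \geq \ell_2 \geq \cdots \geq \gamma$; the last step follows as $b_i \in C_B$ and $w_i \in C_W$, and the
distance $d(C_B, C_W) \geq \gamma$, where $C_B$ and $C_W$ denote the convex hulls of $B$ and $W$, respectively, and
$d(X, Y) = \min_{x \in X} \min_{y \in Y} \|x - y\|$.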


Let $p_i'$ be the projection of $p_i$ onto the line spanned by $w_i b_i$. Observe that
if $p_i \in S_i$ then $\|p_i' - w_i\| \geq \varepsilon \ell_i/2$. Formally, the point $w_i$ breaks the line
spanned by $w_i$ and $b_i$ into two rays, $b_i$ and $p_i'$ are on the same ray,
and $p_i'$ is at distance at least $\varepsilon \ell_i/2$ from $w_i$ along this ray. Observe that
if case (A) above happened, then $p_i$ is not inside $S_i$, and this distance is
significantly larger.
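Setting $\alpha = \angle p_i w_i b_i$, and since $\|p_i - w_i\| \leq \Delta$, we have

\[ \cos \alpha = \frac{\|p_i' - w_i\|}{\|p_i - w_i\|} \geq \frac{\varepsilon \ell_i}{2\Delta}. \tag{1.1} \]

As such, we have

\[ \ell_{i+1} = \ell_i \sin \alpha = \ell_i \sqrt{1 - \cos^2 \alpha} \leq \ell_i \sqrt{1 - \Bigl(\frac{\varepsilon \ell_i}{2\Delta}\Bigr)^2} \leq \Bigl(1 - \frac{\varepsilon^2 \ell_i^2}{8\Delta^2}\Bigr) \ell_i. \tag{1.2} \]

We have that

\[ \ell_{i+k} \leq \ell_i/2, \quad \text{for } k = \bigl\lceil 64\Delta^2/(\varepsilon \ell_i)^2 \bigr\rceil. \tag{1.3} \]

Indeed, if $\ell_{i+k} > \ell_i/2$, then $\ell_j > \ell_i/2$ for $j = i, \ldots, i+k$, and by Eq. (1.2) we have

\[ \frac{\ell_i}{2} < \ell_{i+k} \leq \Bigl(1 - \frac{\varepsilon^2 \ell_i^2}{32\Delta^2}\Bigr)^k \ell_i \leq \exp\Bigl(-\frac{k \varepsilon^2 \ell_i^2}{32\Delta^2}\Bigr) \ell_i \leq \frac{\ell_i}{e^2}, \]

which is a contradiction.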
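
As a sanity check of the halving claim, one can iterate the worst-case recurrence numerically. The sketch below assumes the per-step bound $\ell_{i+1} \leq \ell_i \sqrt{1 - (\varepsilon \ell_i/(2\Delta))^2}$, which follows from $\ell_{i+1} = \ell_i \sin \alpha$ and $\cos \alpha \geq \varepsilon \ell_i/(2\Delta)$, and compares the number of steps needed to halve the length against $k = \lceil 64\Delta^2/(\varepsilon \ell_i)^2 \rceil$. The function names are hypothetical.

```python
import math

def steps_to_halve(ell, delta, eps):
    """Iterate the worst-case recurrence until the length drops to ell/2
    and return the number of steps taken. Assumes 0 < eps < 1 and
    ell <= delta."""
    target = ell / 2
    cur, steps = ell, 0
    while cur > target:
        c = eps * cur / (2 * delta)   # lower bound on cos(alpha)
        cur *= math.sqrt(1 - c * c)   # upper bound on sin(alpha)
        steps += 1
    return steps

def halving_bound(ell, delta, eps):
    """The number of iterations k claimed to suffice for halving."""
    return math.ceil(64 * delta ** 2 / (eps * ell) ** 2)
```

The constant 64 is not tight for this recurrence; it is chosen so that the contradiction argument goes through with a simple exponential bound.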

In particular, the $j$th epoch of the algorithm consists of the iterations where $\ell_i \in [\Delta/2^j, \Delta/2^{j-1}]$. Namely,
during an epoch the width of the current slab shrinks by a factor of two. By Eq. (1.3), the $j$th epoch
lasts $n_j = O\bigl((2^j/\varepsilon)^2\bigr)$ iterations. As such, the total number of iterations $\sum_j n_j$ is dominated by the last
epoch, which starts (roughly) when $\ell_i \leq 2\gamma$, and ends when it hits $\gamma$. This last epoch takes $O\bigl(\Delta^2/(\varepsilon\gamma)^2\bigr)$
iterations, which also bounds the total number of iterations. Upon termination, the slab $S_i$ has width
$(1 - \varepsilon)\ell_i \geq (1 - \varepsilon)\gamma$. ■
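
To spell out why the last epoch dominates, write $J$ for the index of the last epoch (a symbol introduced here only for illustration). Since $\ell_i \geq \gamma$ throughout, we have $\Delta/2^{J-1} \geq \gamma$, that is, $2^J \leq 2\Delta/\gamma$. Using an epoch length of $O\bigl((2^j/\varepsilon)^2\bigr)$ for the $j$th epoch, the sum is geometric:

```latex
\sum_{j=1}^{J} n_j
  = O\Bigl( \frac{1}{\varepsilon^2} \sum_{j=1}^{J} 4^j \Bigr)
  = O\Bigl( \frac{4^J}{\varepsilon^2} \Bigr)
  = O\Bigl( \frac{(\Delta/\gamma)^2}{\varepsilon^2} \Bigr).
```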


Remark. (A) If the data is already labeled, then the algorithm of Lemma 1.1 can be implemented directly,
resulting in the same running time as stated. This algorithm approximates the maximum margin
classifier for the data. Specifically, the above algorithm $(1 + \varepsilon)$-approximates the distance $d(B, W)$, and
it can be interpreted as an approximation algorithm for the associated quadratic program.

(B) One can implement the counterexample oracle by sampling enough labels using the labeling
oracle. This introduces a certain level of error. See [HRZ07] for details.
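
The idea in (B) might be sketched as follows. This is only an illustration under assumptions: the function name, the callables label_of and classify, and the sample size m are hypothetical, and how large m must be for a given error level is not derived here (see [HRZ07] for the analysis).

```python
import random

def sampled_counterexample(points, label_of, classify, m, rng=random):
    """Approximate counterexample oracle.

    points:   list of input points
    label_of: labeling oracle, maps a point to its true label
    classify: current classifier, maps a point to its predicted label
    m:        number of points to sample (hypothetical parameter)

    Returns a sampled point on which the two disagree, or None if the
    sample contains no such point (misclassified points may still exist)."""
    for p in rng.sample(points, min(m, len(points))):
        if label_of(p) != classify(p):
            return p
    return None
```

A None answer here only means that no misclassified point was seen in the sample, which is the source of the error mentioned above.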